Text Data Mining with Optimized Pattern Discovery

Author

  • Hiroki Arimura
Abstract

This paper describes an application of the optimized pattern discovery framework to text and Web mining. In particular, we introduce a class of simple combinatorial patterns over phrases, called proximity phrase association patterns, and consider the problem of finding the patterns that optimize a given statistical measure in a large collection of unstructured texts. For this class of patterns, we develop fast and robust text mining algorithms based on techniques from computational geometry and string matching. We then ran experiments on large collections of documents and on Web pages to evaluate the proposed method.
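To make the problem statement concrete, here is a minimal brute-force sketch in Python. It is not the paper's algorithm, which relies on computational geometry and string matching. It assumes, since the abstract does not define them, that a pattern is a sequence of d single words that must occur in order with each word at most k tokens after the previous one, and that the statistical measure is the empirical classification error between a positive and a negative document set. The names `matches` and `best_pattern` are hypothetical.

```python
from itertools import product


def matches(tokens, pattern, k):
    """Return True if the words of `pattern` occur in `tokens` in order,
    each one at most k positions after the previous one (assumed semantics)."""
    def search(lo, hi, idx):
        if idx == len(pattern):
            return True
        for pos in range(lo, min(hi, len(tokens))):
            if tokens[pos] == pattern[idx] and search(pos + 1, pos + 1 + k, idx + 1):
                return True
        return False

    # The first word may start anywhere in the document.
    return search(0, len(tokens), 0)


def best_pattern(pos_docs, neg_docs, d=2, k=3):
    """Exhaustively find the d-word pattern minimizing classification error.

    Error counts positive documents the pattern misses plus negative
    documents it matches, divided by the total number of documents.
    """
    vocab = sorted({word for doc in pos_docs for word in doc})
    total = len(pos_docs) + len(neg_docs)
    best = None
    for pattern in product(vocab, repeat=d):
        missed = sum(not matches(doc, pattern, k) for doc in pos_docs)
        false_hits = sum(matches(doc, pattern, k) for doc in neg_docs)
        error = (missed + false_hits) / total
        if best is None or error < best[0]:
            best = (error, pattern)
    return best


pos_docs = [["text", "data", "mining"], ["fast", "text", "mining"]]
neg_docs = [["mining", "text"], ["web", "pages"]]
result = best_pattern(pos_docs, neg_docs, d=2, k=2)
```

The exhaustive search costs time proportional to the vocabulary size raised to the power d, which is exactly the cost that an efficient algorithm for this pattern class would need to avoid on large collections.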


Similar resources

Text Data Mining: Discovery of Important Keywords in the Cyberspace

This paper describes applications of the optimized pattern discovery framework to text and Web mining. In particular, we introduce a class of simple combinatorial patterns over phrases, called proximity phrase association patterns, and consider the problem of finding the patterns that optimize a given statistical measure within the whole class of patterns in a large collection of unstructured t...


Efficient Text and Semi-structured Data Mining: Knowledge Discovery in the Cyberspace

This paper describes applications of the optimized pattern discovery framework to text and Web mining. In particular, we introduce a class of simple combinatorial patterns over texts such as proximity phrase association patterns and ordered and unordered tree patterns modeling unstructured texts and semi-structured data on the Web. Then, we consider the problem of finding the patterns that opti...


HARIALGM: Knowledge Discovery and Data Mining in Pedagogy with DNA Finger Printing

Knowledge Discovery and Data Mining (KDD) is a multidisciplinary area focusing upon methodologies for extracting useful knowledge from data and there are several useful KDD tools to extract the knowledge. The ongoing rapid growth of online data due to the Internet and the widespread use of databases have created an immense need for KDD methodologies. The challenge of extracting knowledge from d...


A Survey on Web Log Mining Pattern Discovery

The web is a great source of information and knowledge, where numerous users find content of interest. The available data is in the form of structured (relational) and text data. Therefore, different kinds of data models can be applied to web data for pattern discovery. Web mining is a data mining tool where web-related data is evaluated for pattern discovery and user navigation patterns. Add...


Web Usage Mining Tools & Techniques: A Survey

The quest for knowledge has led to new discoveries and inventions, which in turn lead to the improvement of various technologies. As years passed, the World Wide Web became overloaded with information and it became hard to retrieve data according to need. Web mining emerged to provide a solution to this problem. Web usage mining is a category of web mining. Web usage mining mainly circulation with...




Publication date: 2000